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DIFFERENTIAL CORRECTION SCHEMES 


IN NONLINEAR REGRESSION 

Henry P. Decell, Jr. 
University of Houston, Houston, Texas 

F. M. Speed 

Texas A & I University 


Abstract 

This paper briefly reviews and improves upon classical iterative 
methods in nonlinear regression. This is accomplished by discussion of 
the geometrical and theoretical motivation for introducing modifications 
using generalized matrix inversion, other than but in the same general 
vein as those discussed by Fletcher [6]. Examples having inherent pitfalls 
described in [8], [12] and others are presented and compared in terms of 
results obtained using classical and modified techniques. The modification 
is shown to be useful alone or in conjunction with other modifications 
appearing in the literature. 



Introduction 


Following for convenience the notation of [8] , let y^_ 
set of n responces of the form 

y t = f t (8) + e t » t = 


where the response function f ^ (0 ) is a known function of t 
undetermined vector 0 = ). We will call the vector 

A 

squares estimate (given the n responses) of 0 provided 0 


Q(0) = I (y. - f (0)) : 

t=i 


The vectors are defined 


Q , (e) . lMe» 


R(e> - (y t - f t (6)> 


and the matrices 


f'(0) = 


3 (f (0) ) T 

k-> 


Q"(0) = 


nr 3Q(6K 
30 . ' 

J 

30 . 


2 


denote a 


and an 

A 

0 a least- 
minimi zes 



Three of the most common differential correction schemes for 


estimating the parameter vector 0 are the steepest descent method, the 
quadratic approximation, and the Gauss-Newton method, with corrections 
respectively given by 

A0 = -aQ'(0) , a > 0 

A0 = — (Q * ’ (0)) -1 Q’ (0) 

A0 = -l/2(f'(0) T f'(0))" 1 Q , (0) . 

These methods have their advantages and disadvantages. Of the 
three, the Gauss-Newton method is probably most popular. 

The authors of [8] present a modification of a classical method and 
state that "The step A0 will in general be distinct in both length and 
direction for each of the three methods." This is not necessarily the case 
from a computational point of view since the matrices to be inverted may be, 
for all practical computational purposes, singular; yet the system of 
equations may have infinitely many solutions. For example, the Gauss- 
Newton correction requires the solution of the equation 

f'(0) T f'(0)A0 = f ' (0) T R(0) 

since 

-1/2Q' (0) = f ' (0) T R(0) . 
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It is known that any equation of this form (i.e., of the form 
T T 

A Ax = A z, the normal equations of the least-squares problem: minimize 

T 

(Ax-z) (Ax-z) given A and z) always has at least one solution and 
perhaps infinitely many. We will try to point out the significance and 
consequences of these solutions in terms of their relationship to 
differential correction schemes. 


The Generalized Inverse 

A few basic concepts regarding generalized inverses important to 
the development follow. 

Theorem 1 . The four equations AXA = A, XAX = X, (AX)* = AX, and 
(XA)* = XA have a unique solution X for each complex m*n matrix A. 
This solution X is called the generalized inverse of A and is denoted 
by X = A + . 

This theorem is due to Penrose [10] and is equivalent to the 
apparently more geometric characterization of the generalized inverse 
of A which follows. 


Theorem 2 . The generalized inverse A of A is the unique solution 
of the equations 


AX = 


P R(A) 


XA = 


P R(X) 
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where P 


, respectively, denote the perpendicular 


R(A) and P R(X) 

projection operators on the range spaces (column spaces) of A and X. 

In any case, it is easy to see that if A is square and non- 
singular, then A + is the ordinary inverse of A. Much work has been 
done recently in the area of generalized matrix inversion, including 
theoretical developments and computational techniques, rendering it a 
very useful tool in matrix theory and applications. A rather exhaustive 
bibliography concerning applications of generalized inverses can be found 
in [2], [3], and [13], We will not develop the details of the basic 
concepts, but rather state an important theorem regarding the solution of 
matrix equations in general. 

Theorem 3 . The matrix equation AXB = C has a solution X if and only 
if AA + CB + B = C, in which case all solutions are given by 

X = A + CB + + S - A + ASBB + 

where S is an arbitrary matrix having the dimensions of X. 

T T 

The Equation A Ax = A z 

As stated earlier, the Gauss-Newton method involves the solution 
of an equation of this type at each iteration. The following corollary 
to Theorem 3 will give some insight to a possible course of action one 
could take at those times during the iteration process when the matrix 
f'(9) f * ( 0) (or perhaps even a matrix such as Q"(0) in another method 



requiring inversion for the calculation of the correction A8) is 
actually or nearly singular. For the purpose of this paper, we will 

describe how generalized inversion can be useful in iterative techniques 

, . T T 

requiring the solution of equations of the form A Ax = A z. 

Corollary 1 . If A is any m x n matrix and z is any m x l vector, then 
T T 

the equation A Ax = A z has at least one solution and all solutions are 
given by 

X = A + z + (I - A + A)y 

where y is arbitrary having the dimensions of x. 

The proof of Corollary 1 is an immediate consequence of Theorem 3 
and fact that (A T A) + A T = A + [10]. 

T T + 

Corollary 2 . Among the solutions of A Ax = A z, the solution x = A z 
has the smallest Euclidean norm (henceforth "norm" will be denoted | | • | | ) . 

The proof of Corollary 2 follows from the facts that I - A + A is 
the orthogonal projection operator on the orthogonal compliment of the 
range space of A + and hence that A + z and (I - A + A)y are orthogonal 
for every y. In fact, 

| |A + z + (I - A + A)y|| 2 - | | A + z | | 2 + | | (I - A + A>y|| 2 

>||A +Z || 2 . 
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The significance of Corollary 1 is that there may be infinitely 
many possible corrections A0 satisfying an equation defining a 
differential correction scheme in the presence of a singular or, in the 
computational sense, nearly singular coefficient matrix. There is a 
tendency to disregard or remain unaware of these solutions and, with the 
inability to invert the coefficient matrix, to look for new or modified 
techniques such as those found in [1], [5], [8], [9], and [12], For 
example, in [7] Jennrich and Sampson medify the coefficient matrix by 
selected rows and columns. In [8], Marquardt changes the diagonal of the 
coefficient matrix. It has been our experience that these solutions should 
be given careful attention in the case of what will hereafter be called an 
apparent (i.e., actual or computational) singularity. 

Fletcher [6] points out that in the generalized least-squares 
(Gauss-Newton) or Newton methods "... A most important property of the 
generalized inverse formulation is that in all circumstances (i.e., full 
rank or not), even when the generalized least-squares method would fail, 
the directions of search generated are downhill and so an imporvement can 
always be made to the sum of squares ( assuming the approximation is not 
already a stationary point)." In this connection, the significance of 
Corollary 2 is that there is a reasonable way to choose a correction A0 
satisfying the defining equations of the scheme whenever an apparent 
singularity occurs. We propose to choose the minimum Euclidean norm 
correction A + z (i.e., the correction of shortest length consistent with 
the correction equation) . It has been our experience that in nonlinear 



equations other solutions can result in failure of convergence. 

The suggested correction certainly depends upon the algorithm used 

to calculate A + and the actual computational way in which the algorithm 

T 

establishes that A is not of full rank (i.e., A A singular). Of course, 
this is intimately connected with near-zero tests in the algorithm, 
sensitivity to dependent columns or rows, conditioning, and so forth. We 
should further point out that, for a general differential correction scheme 
of the form M(0)A0 = z(8), the choice of the correction should be 
A0 = M(0) + z(0) if there is at least one solution for A0. Of course, 
according to Theorem 3 there will be at least one and possibly infinitely 
many solutions A0 if and only if M(0)M(0) + z (0) = z(0). Moreover, if 
there is one and only one solution, then that solution is indeed given 
by A0 = M(0) + z(0). 

T 

For example, in the Gauss-Newton method, M(0) = f'(0) f'(0) and 

z(0) = f’(0) T R(0) so that A0 = M(8) + z(0) = (f * (0) T f ' (0)) + f ' (0) T R(0) = 

f'(6) + R(8)* Even if M(0) is nonsingular, then (f ' (0) T f ' (0)) + = 

T -1 

f T (0) f f (0) ) , and either form of A0 may be used in calculations: 

A0 = (f 1 (0) T f ’ (0)) _1 f ’ (8) T R(8) = f'(0) + R(0) • 

In other words if M(0) is square and computationally nonsingular, the 
classical correction is, in fact, the minimum norm, correction. We will 
not discuss the comparative aspects of computing A0 in a correction 
scheme such as the Gauss-Newton method by one or the other of the 


theoretically equivalent formulas: 


(1) A0 = (f'(0) T f , (0)) + f f (6) T R(6) 

(2) A6 = f'(0) + R(0) 

Calculations in our examples use (2) . 

We have had unusual success with this technique in many practical 
problems too numerous to mention here. In many cases, one definite 
advantage seems to be the ability to continue making corrections of 
reasonable length and perhaps, as in the Gauss-Newton case, reasonable 
direction through regions in which the coefficient matrix M(0) behaves 
badly. We do not propose this technique as a cure-all but rather that it 
should be included among other useful techniques in nonlinear regression. 

A few examples having known pitfalls will be presented in the next section. 

Examples . 

In the following examples, the residual sum of squares Q(0) will 
be presented in tables by iteration number. The values of Q(0) for the 
methods cited will be those values tabulated in the references cited. Some 
authors divide Q(0) by the degrees of freedom. For clarity and easy 
comparison we indicate this division in the tables when necessary. Finally, 
the residual sum of squares given by the method of this paper (minimum norm 
correction) will be noted MN; Q(0). 



Results of the method of this paper compared with those of the 
Modified Davidon Method (MDM) used in [12] to find the parameters of an 
exponential model discussed by Hartley in [7] are given in Table 1. 

Table 1 

Exponential Model (Hartley) 





Iteration 

MN; Q(6) 

MDM; Q(0) 

0 

27376 

27376 

1 

14586 

20127 

2 

13779 

15412 

3 

13408 

13552 

4 

13394 

13485 

5 

13390 

13449 

6 


13425 

7 


13394 

8 


13393 

9 


13390 


10 




A second exponential model given by the authors of [8] points out 
a failure of Hartley's method [7] due to a singular partial derivative 
matrix. In [8] a stepwise regression scheme (SR) is successfully utilized 
for this example. The results of the (SR) scheme compared with those of 
the method of this paper are given in Table 2. 

Table 2 

Exponential Model - Singular Partials 





Iteration 

MN; Q(6)/8 

SR; Q(0)/8 

0 

521.41 

521.41 

1 

429.84 

429.84 

2 

39.11 

88.15 

3 

15.765 

83.74 

4 

15.545 

* 

10 


21.33 

30 


15.545 


*The value of SR: Q(8) was not tabulated in [8] 

for this iteration. 

Another six-parameter exponential model having inherent singularity 
problems is presented in [12] using the Modified Davidon Method (MDM) . 

A comparison of the results using the technique of this paper is given 
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in Table 3. 


Table 3 

Six Parameter Exponential Model - Singular Partials 


Iteration 

MN; Q(0) 

MDM; Q(0) 

0 

21.38 

21.38 

10 

.873 

2.39 

20 

.792 

1.99 

30 

.396 

1.77 

40 


1.59 

50 


1.41 

60 


.90 

70 


.41 

80 


.407 


Concluding Remarks 


We have taken the liberty to exclude a reproduction of the detailed 
description of our example models. These models are thoroughly treated in 
[7], [8] and [12], The tables give some indication of rates of convergence 
and a comparison of residuals only. We do not wish to leave the impression 









that iteration counts are comparable. For example, one Gauss-Newton 
iteration could have been equivalent to p conjugate direction steps 
for the matrix inversion employing the Davidon method. 
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